Online Supplement to “The Knowledge-Gradient Policy for Correlated Normal Beliefs”

Authors

  • Peter Frazier
  • Warren Powell
  • Savas Dayanik
Abstract

As discussed in Section 3 of the main paper, the KG policy possesses several optimality and convergence properties. First, it is optimal by construction when N = 1 (Remark 1). Second, the suboptimality gap between the values of the KG policy and the optimal policy shrinks to 0 as N → ∞ (Theorem 4). This is a convergence result: it shows that sampling under the KG policy is guaranteed to eventually discover the alternative that is truly best. Third, the suboptimality gap is bounded for N between these two extremes (Theorem 5). Here we discuss and prove the latter two results, treating the convergence result in Section A.2 and the general bound on suboptimality in Section A.3. These results extend those proved in Frazier et al. (2008) for independent normal priors.

Similar resources

The knowledge gradient algorithm for online learning

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. The resulting decision rule easily extends to a variety of settings, including the case where our prior beliefs about the rewards are correlated. Experiments show that the KG policy performs competitively against other learning policies in diverse situations. In the ca...

The Knowledge-Gradient Policy for Correlated Normal Beliefs

We consider a Bayesian ranking and selection problem with independent normal rewards and a correlated multivariate normal belief on the mean values of these rewards. Because this formulation of the ranking and selection problem models dependence between alternatives’ mean values, algorithms may utilize this dependence to perform efficiently even when the number of alternatives is very large. We...

The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach is able to handle the case where our prior beliefs about the rewards are correlated, which is not handled by traditional multi-armed bandit methods. Experiments show that our KG policy performs competitively against the best known approximation to the opti...

The effect of language complexity and group size on knowledge construction: Implications for online learning

This study investigated the effect of language complexity and group size on knowledge construction in two online debates. Knowledge construction was assessed using Gunawardena et al.’s Interaction Analysis Model (1997). Language complexity, a measure of lexical variation, was determined by dividing the number of unique words by the total number of words. The results showed that...

Optimal learning for sequential sampling with non-parametric beliefs

We propose a sequential learning policy for ranking and selection problems, where we use a non-parametric procedure for estimating the value of a policy. Our estimation approach aggregates over a set of kernel functions in order to achieve a more consistent estimator. Each element in the kernel estimation set uses a different bandwidth to achieve better aggregation. The final estimate uses a weigh...

Publication date: 2008